ABSTRACT

This paper presents a procedure for processing real-world image
sequences produced by relative translational motion between a
sensor and environmental objects. In this procedure, the
determination of the direction of sensor translation is
effectively combined with the determination of the displacements
of image features and environmental depth. It requires no
restrictions on the direction of motion, nor on the location and
shape of environmental objects. It has been applied successfully
to real-world image sequences from several different task
domains.

The processing consists of two basic steps: Feature Extraction
and Search. The feature extraction process picks out small image
areas which may correspond to distinguishing parts of
environmental objects. The direction of translational motion is
then found by a search which determines the image displacement
paths along which a measure of feature mismatch is minimized for
a set of features. The correct direction of translation will
minimize this error measure and also determine the corresponding
image displacement paths for which the extracted features match
well.


1.  INTRODUCTION 


This paper develops a procedure for processing real-world image
sequences from a translating sensor. The procedure is not
dependent on an initial matching process for finding image
displacements before inferring environmental depth and camera
motion. Instead, the determination of the direction of sensor
translation is combined with the determination of the
displacements of image features and of environmental depth.
No restrictions on the shape of environmental objects are
required. The procedure has been applied to real-world image
sequences under several different operating conditions with
robust performance.

The processing consists of two basic steps: Feature Extraction
and Search. The feature extraction process picks out small image
areas which may correspond to distinguishing parts of
environmental objects. The direction of translational motion is
then found by a search which determines the image displacement
paths along which a measure of feature mismatch is minimized for
a set of features. The correct direction of translation will
minimize this error measure and also determine the corresponding
image displacement paths for which the extracted features match
well.

The feature extraction process, which is presented in section
two, finds distinctive points along image contours determined
by simple processes which are sensitive to image edges. It
finds features which are positioned at points of high
curvature along these contours. The technique utilizes
contours determined by zero-crossing extraction and image
thresholding.

The search process, which is presented in section three,
minimizes a simple error measure. This error measure is
defined with respect to a unit sphere with each point on the
sphere corresponding to a different direction of sensor
translation. A given direction of translation constrains the
motion of extracted image features to straight lines which
radiate from or converge onto a single point. The error
measure thus associates with a point on the unit sphere,
corresponding to a particular translational axis, a number
describing the total extent of feature mismatch along the
displacement paths determined by the translational axis.
Experiments have shown this error measure to be smooth, with
a distinct minimum in a large neighborhood about the correct
translational axis. Because of this, the search process can be
quite simple.

The fourth section presents several experiments showing the
results of applying the procedure in several different
situations. The experiments indicate that the procedure is
very robust and applicable to a wide range of real-world image
sequences.

The fifth section discusses various aspects of the procedure
and outlines some applications and extensions.


1.1. Coordinate System


The camera model consists of a planar retina embedded in a
three-dimensional Cartesian coordinate system (X,Y,Z), with
the origin at the focal point and the optical axis aligned
with the positive Z-axis (figure 1). The X and Y axes
correspond to the gravitationally intuitive horizontal and
vertical directions respectively. The image plane is parallel
to the XY plane and at some distance along the Z axis.
Positions in the image plane are described using a 2-d
coordinate system aligned with the X and Y axes of the camera
coordinate system and with the origin determined by the
intersection of the image plane and the Z-axis.


The axes of translation are unit vectors based at the origin
of the camera coordinate system and are described by two
angles (Phi1, Phi2) (figure 2). For a unit vector, V, based
at the origin, Phi1 is the angle between the (0,1,0) vector
and the edge determined by the intersection of the YZ plane
and the plane determined by the X axis and V. Phi1 thus
specifies a plane containing the X axis. Phi2 is the angle
between (-1,0,0) and V. Note that for all angles a and b,


(a, 0) = (b, 0) and (a, PI) = (b, PI).
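
The angle pair can be turned into a unit vector in camera
coordinates. The following is a minimal sketch, not the authors'
code: the function name axis_from_angles is hypothetical, and it
assumes that Phi1 is measured from (0,1,0) toward (0,0,1), a sign
convention the text does not state.

```python
import math

def axis_from_angles(phi1, phi2):
    """Unit translational axis V for the angle pair (Phi1, Phi2).

    Assumed convention: Phi1 rotates from (0,1,0) toward (0,0,1).
    """
    # Direction, inside the YZ plane, of the plane containing the X axis and V.
    u = (0.0, math.cos(phi1), math.sin(phi1))
    # Phi2 is the angle between (-1,0,0) and V, measured inside that plane.
    c, s = math.cos(phi2), math.sin(phi2)
    return (-c, s * u[1], s * u[2])
```

Under this convention (a, 0) always yields (-1,0,0) and (a, PI)
always yields (1,0,0), which agrees with the identity noted above.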


Figure 1. Camera Model


1.2.


It is necessary to have a set of terms for describing the
motion of features in an image sequence and the corresponding
motion of environmental points. We define an Image
Displacement Vector to be a two-dimensional vector describing
the displacement of an image feature from one image to the
next. An Image Displacement Field is the set of image
displacement vectors for successive images. An Image
Displacement Sequence indicates the positions of an image
feature over several successive images. Though we are dealing
with discrete image sequences, it is often possible to describe
the continuous curve along which an image feature point is
moving. This curve is called the Image Displacement Path.

Corresponding to image motions we use a set of terms for
describing environmental motions. An Environmental
Displacement Field is the set of three-dimensional vectors
indicating the positions of environmental points at successive
instants. An Environmental Displacement Sequence indicates
the position of an environmental point over several successive
instants. An Environmental Displacement Path describes the
three-dimensional curve that environmental points are moving
along for particular motions.

For general camera motion, there are 5 parameters [PRA80,
PRA81] that can be recovered from processing image motion
without knowing absolute camera displacement or velocity
(since absolute depth is lost): two parameters for the unit
vector (T1(t), T2(t)) which describes the axis of
translational motion at time t; two parameters for the unit
vector (R1(t), R2(t)) describing the axis of rotation at time
t; and one parameter R3(t) which describes the extent of
rotation about this axis at time t. Both of these axes are
positioned at the origin of the camera coordinate system. The
problem of processing image motion resulting from rigid body
camera motion can be organized into subcases of increasing
complexity, corresponding to the number of camera motion
parameters that are unconstrained.


For purely translational motion, the image displacement paths
are determined by the intersection of the translational axis
with the image plane. If the translational axis intersects
the image plane on the positive half of the axis, the point of
intersection is called a Focus of Expansion (FOE) and the
image motion is along straight lines radiating from it. This
corresponds to camera motion towards environmental points. If
the translational axis intersects the image plane on the
negative half of the axis, the point is called a Focus of
Contraction (FOC) and the image displacement paths are along
straight lines converging towards it. This corresponds to
camera motion away from environmental points. The
intersections of axes parallel to the image plane are points
at infinity and are treated as FOEs.
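
The FOE/FOC distinction can be illustrated with a short sketch.
The function name focus_point and the focal_length parameter are
hypothetical; the image plane is assumed to lie at Z = focal_length
in the camera coordinate system of section 1.1.

```python
def focus_point(axis, focal_length=1.0):
    """Classify where a translational axis meets the image plane.

    Returns (kind, point). kind is "FOE" or "FOC"; point is (x, y) in
    image coordinates, or None when the axis is parallel to the image
    plane (a point at infinity, which is treated as an FOE).
    """
    vx, vy, vz = axis
    if vz == 0:
        return ("FOE", None)
    # The line through the origin along the axis meets the plane Z = f here.
    point = (focal_length * vx / vz, focal_length * vy / vz)
    # Positive half of the axis reaches the plane only when vz > 0.
    return ("FOE" if vz > 0 else "FOC", point)
```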

Given the direction of translation and image displacements,
the relative depths of points can be computed by solving the
inverse perspective transform [ROG76]. Relative depth can
also be inferred from the position of a feature and the extent
of its displacement relative to an FOE or an FOC. This
relation is expressed as

\[ \frac{D}{\Delta D} = \frac{Z}{\Delta Z} \]

where Z is the value of the Z component of an environmental
point at time t+1, del Z (\Delta Z) is the extent of
environmental displacement along the Z axis from time t to time
t+1, D is the distance of the corresponding image point from
the FOE or FOC at time t, and del D (\Delta D) is the image
point's displacement from time t to time t+1. Thus, the Z value
of an environmental point can be recovered from image
measurements in units of del Z, or what has been termed
Time-Until-Contact by Lee [LEE76].
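
As a worked check of this relation, the sketch below computes
D / del D for one synthetic point under perspective projection and
compares it with Z / del Z. The function name and the numbers are
illustrative assumptions, not data from the experiments.

```python
def time_until_contact(d, delta_d):
    """Z / del Z recovered from image measurements as D / del D."""
    if delta_d == 0:
        raise ValueError("feature did not move; depth is unconstrained")
    return d / delta_d

# Synthetic point at lateral offset 1.0, camera moving along the Z axis,
# so the FOE is at the image origin. Focal length is taken as 1.0.
focal, x_env = 1.0, 1.0
z_t, z_t1 = 10.0, 8.0                 # depth at time t and at time t+1
d_t = focal * x_env / z_t             # distance from the FOE at time t
d_t1 = focal * x_env / z_t1           # distance from the FOE at time t+1
ratio = time_until_contact(d_t, d_t1 - d_t)
expected = z_t1 / (z_t - z_t1)        # Z at t+1 in units of del Z
```

Here ratio and expected agree up to floating-point rounding.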

The set of all possible translational axes describes a unit
sphere called the translational direction sphere. The
procedures below are defined with respect to this sphere,
rather than the image plane itself, for reasons described in
section 5.2.2.


1.3. Previous and Related Work

Gibson [GIB50, GIB66] was among the first to study the
importance of the structure of optic flow fields in the
control of egomotion. He also pointed out the potential
importance of the FOE in the translational case. This work
was extended by Purdy and by Lee [LEE76, LEE80]. Lee analyzed
the computation of important control information during
translational motion, such as time-until-contact, braking
information, and environmental depth in a natural coordinate
system. However, recent work by Beverly and Regan [REG78,
REG79] indicates that humans may use a mechanism other than
extraction of an FOE or FOC for determining the direction of
translation.

Work in processing dynamic images [HUA81, MAR79, NAG81a,
THO81, ULL81] can be roughly divided into a set of techniques
for determining the changes in a sequence of images and a set
for interpreting these transformations. Determining image
motion has involved work in change detection, correlation
based matching techniques [QUA71, HAN74, LEV73], relaxation
based matching techniques [BAR79, PRA80], region matching
[NAG78, RAD81], image differencing [JAI81], and
spatio-temporal analysis of the image gradient under
constraints of locally uniform motion [THO80] and smoothly
varying image motion [HOR81, GLA81].

The interpretation of image motion can result in a variety of
descriptions. These include determination of the occurrence and
location of change [MAR79]; image segmentations based on common
motion [THO80]; the recovery of camera motion parameters and
the shape of environmental objects; and more qualitative
descriptions at a level compatible with natural language
descriptions [BAD76, TSO80, O'ROU80]. Particularly relevant
here is work with the recovery of camera motion parameters from
an image sequence. Ullman [ULL79] demonstrated the minimal
number of observable points necessary to obtain a solution over
time. Roach and Aggarwal [ROA80] investigated noise robust
computations of the camera motion parameters. Prazdny developed
techniques for decomposing optic flow fields [PRA80] and
displacement fields [PRA81] into their rotational and
translational components. The latter result has been given a
particularly simple algebraic formulation by Nagel [NAG81b].
Tsai and Huang [TSA81] have investigated the simplifications in
determining camera motion parameters by restricting the
interpretation to planar surface patches.

Williams [WIL80] was the first to develop algorithms for
interpreting complex natural images produced by an optic
sensor translating relative to environmental objects. This
work consisted of two processes: one for inferring the
direction of translation given environmental depth information
and the other for inferring depth given the direction of
motion. These processes used an error measure describing the
consistency of depth information and the inferences of feature
motion along image displacement paths. His work indicated
that these two processes, for inferring depth and the
translational axis, could be combined.


The primary weakness of Williams' work was the necessary
restriction to planar surfaces at one demonstrated orientation.
Additionally, the processing is quite complex when neither
environmental depth nor the direction of translation is known,
involving segmentation, resegmentation, and coordinating the
processes for inferring depth and for inferring the direction
of translation. The method presented here requires no
restrictions on the orientation of surfaces or shape of
environmental objects, and involves only a simple procedure
for evaluating an error measure. It also indicates that the
direction of sensor motion should be determined prior to, or
concurrently with, environmental depth.

The determination of the vanishing point in a static image is
closely related to determining the direction of translation
because the FOE is the dynamic analogue of the vanishing
point. In perspective projection, parallel lines in the
environment map onto lines radiating from a vanishing point in
the image. For translational motion, the environmental motion
paths correspond to the parallel lines in the perspective
case. Techniques for extraction of a vanishing point have
been developed by Kender [KEN79] and Kitihashi [KIT80]. The
use of the Hough transform in this work is similar to the
global sampling of the error measure used below.


2. FEATURE EXTRACTION


The feature extraction process is used to determine small
areas (sometimes called image points or features) in an image
that are distinct from neighboring areas. This
distinctiveness limits the potential matches of these image
areas, and possibly reflects a correspondence to actual and
significant points in the environment, such as points of high
curvature on object boundaries, texture elements, surface
markings, etc. (However, some features, termed false features,
will result from noise, occlusion, and light source effects
and have behavior which is currently difficult to interpret.)
Features can be represented either as arrays of numbers
extracted directly from an image, or as parameterized tokens
describing local image properties. In this paper, we refer to
features exclusively as small arrays of data values centered
at some point in an image at some time t.

Following Moravec [MOR77, MOR80], the method of feature
extraction used here is based upon finding image areas which
are significantly different from their neighboring areas.
Using a correlation measure bounded between 1 (for perfect
correlation) and 0, the distinctiveness of a feature is 1
minus the best correlation value obtained when the feature is
correlated with its immediately neighboring areas (see the
correlation measures in section 3.1). Selecting good
features then requires finding the local maxima in the values
of the distinctiveness measure over an image.
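
A minimal sketch of the distinctiveness measure follows. It
assumes that the "immediately neighboring areas" are the 8 patches
offset by one pixel, that images are lists of rows of non-negative
values, and that Moravec's correlation (section 3.1) is the
measure used; the function names are hypothetical.

```python
def patch(image, row, col, half):
    """Flattened (2*half+1) x (2*half+1) patch centered at (row, col)."""
    return [image[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]

def moravec_correlation(a, b):
    """Cross product sum divided by the mean of the two energy sums."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) + sum(y * y for y in b)) / 2.0
    # Two all-zero patches are identical, so report a perfect match.
    return num / den if den else 1.0

def distinctiveness(image, row, col, half=2):
    """1 minus the best correlation with the 8 one-pixel-shifted patches."""
    center = patch(image, row, col, half)
    best = max(
        moravec_correlation(center, patch(image, row + dr, col + dc, half))
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0))
    return 1.0 - best
```

A uniform region scores 0, while an isolated bright pixel scores
well above 0, since no shifted neighbor matches it closely.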


We have extended this approach somewhat by constraining the
neighborhoods over which the features are selected to contours
determined by other global processes, such as zero-crossing
extraction and thresholding, which are sensitive to edges.


2.1. Feature Extraction Using Zero-Crossings

The use of zero-crossings to determine significant image
contours at different levels of resolution has been proposed
and extensively studied by Marr et al. [HIL80, MAR80]. In
this processing an image is convolved with Gaussian-Laplacian
masks ((del)**2g) of different positive widths and thresholded
at zero to determine zero-crossing contours. These contours
are significant since they correspond to the points of
greatest change in the convolved image. The distinctiveness
measure can be applied to points along these contours in the
convolved image, with the local maxima determining the
position of potential features. This generally has the effect
of finding points of high curvature along the zero-crossing
contour, although points apparently corresponding to local
occlusion vertices and weak maxima will also be extracted.

Many of the weak features which are local maxima of
distinctiveness can be removed by suppressing those which are
at points of low curvature along the zero-crossing contours.

For features which are local distinctiveness maxima, we
approximate the curvature along the contour by the inner
product of the normalized vectors describing the relative
positions of adjacent local maxima along the contour (figure
3). These values are then thresholded between 1.0
(corresponding to high curvature) and -1.0 (corresponding to
low curvature).
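
The curvature approximation can be sketched as follows. The
function names are hypothetical, the contour is assumed to be an
open, ordered list of local-maximum positions, and end points are
simply skipped; none of these details is specified in the text.

```python
import math

def curvature_score(prev_pt, pt, next_pt):
    """Inner product of the unit vectors from pt to its adjacent maxima."""
    ax, ay = prev_pt[0] - pt[0], prev_pt[1] - pt[1]
    bx, by = next_pt[0] - pt[0], next_pt[1] - pt[1]
    na, nb = math.hypot(ax, ay), math.hypot(bx, by)
    if na == 0 or nb == 0:
        return -1.0  # coincident neighbors: treat as low curvature
    return (ax * bx + ay * by) / (na * nb)

def suppress_low_curvature(points, threshold=-0.8):
    """Keep interior contour points whose score is at least the threshold."""
    kept = []
    for i in range(1, len(points) - 1):
        score = curvature_score(points[i - 1], points[i], points[i + 1])
        if score >= threshold:
            kept.append(points[i])
    return kept
```

A point on a straight run scores -1.0 and a right-angle corner
scores 0.0. A threshold of -0.8 corresponds to an angle of
arccos(-0.8), about 143.13 degrees, between the two vectors.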

Use of zero-crossing based features requires specification of
the sizes of the convolution masks that are employed, and
deciding whether to position extracted feature points with
respect to the unprocessed image or the convolved images. It
is usually beneficial to use masks of various widths for
sensitivity to features at different levels of resolution.
The processing described below can be applied independently to
the pairs of successive images formed by convolving the
successive images with two such masks. Alternatively,
features can be extracted from the original, unfiltered image
at the positions where features were determined in the
convolved images, though experience with large masks has shown
that this approach can position features significant distances
from their apparent position in the original image.

The images in figure 4a and figure 4b were taken from a
gyroscopically stabilized movie camera held by a passenger in
a car traveling down a country road in Massachusetts. They
are 128x128 pixel images with 6 bits of resolution in
intensity and will be referred to as the roadsign images.
Figure 4c shows the zero-crossings extracted from the initial
roadsign image using a (del)**2g mask with a width of 5
pixels. The distinctiveness values were computed using
features which were 5x5 pixel arrays extracted from the
convolved image and centered on pixels which were adjacent to
the zero-crossing contour and of positive value. These
features were correlated, using Moravec's norm (see section
3.1), with their 8 immediately neighboring features. The
distinctiveness measure for a feature was set to 1 minus the
best correlation obtained in its neighborhood, excluding
itself. Figure 4d shows the local maxima in the
distinctiveness measure positioned with respect to the
zero-crossing contour. Note that the features are centered on
pixels adjacent to the contour and not on the contour itself.
Figure 4e shows the results of suppressing low-curvature
points using a threshold set to -0.8 (corresponding to an
angle of 143.13 degrees).


2.2. Feature Extraction Using Threshold Contours

Image contours can also be determined by thresholding. The
values of the threshold can be determined in a variety of
ways, such as using fixed increments, finding peaks and
valleys in the image intensity histogram, or using techniques
sensitive to image contrast produced using a particular
threshold [KOH81, WES75].

The images in figure 5a and 5b were produced from a solid
state camera held by a robot manipulator translating toward
some industrial parts lying on a table. The images were
initially 512x512 pixel images with 7 bits of intensity
resolution and were averaged down to 128x128 pixel images with
6 bits of intensity resolution. These will be referred to as
the industrial images. Analysis of the image intensity
histogram, using the procedures described in [KOH81],
determined a clear break in the histogram at an intensity
level of 10 in the image. This corresponded to separating the
dark background and the brighter objects in the scene. Figure
5c shows the extracted contour and figure 5d the local maxima
in the distinctiveness measure of image features centered on
pixels adjacent to the contour and of intensity value greater
than or equal to ten. Figure 5e shows the extracted feature
points after low-curvature suppression using a threshold set
to -0.8 (corresponding to an angle of 143.13 degrees).


Industrial Image 1


3. SEARCH PROCESS


The search process minimizes an error measure which describes
the extent of mismatch for extracted features along the image
displacement paths determined by a hypothesized translational
axis. For example, figure 6 shows an FOE determined by a
potential translational axis and the image displacement paths
it determines for some extracted features. Also shown is the
match profile for a particular feature along a segment of its
displacement path with respect to features positioned in the
succeeding image. The adequacy of a proposed translational
axis is measured by the strength of the matches that the
extracted features have along these paths. This suggests
finding the best match for each feature along the image
displacement path determined by a translational axis and then
summing the extent of error in these best matches for the
error measure.

Developing this error measure requires a measure for the
degree of match between features and an interpolation process
for determining positions along an image displacement path.
Each of these can be implemented in various ways with the
choices generally involving a trade-off between the speed of
evaluating the error measure and the precision with which the
translational axis can be determined.

Figure 6. Feature Displacements for a Potential Translational Axis


3.1. Match Metric

There are several metrics for similarity of nxn pixel features
of the form A(i,j) and B(i,j), where i ranges from 1 to n and
j ranges from 1 to n. We have utilized:


Normalized Correlation

\[ \frac{\sum_i \sum_j A(i,j)\,B(i,j)}
        {\sqrt{\sum_i \sum_j A(i,j)\,A(i,j)}\;\sqrt{\sum_i \sum_j B(i,j)\,B(i,j)}} \]

Moravec Correlation [MOR77]

\[ \frac{\sum_i \sum_j A(i,j)\,B(i,j)}
        {\left(\sum_i \sum_j A(i,j)\,A(i,j) + \sum_i \sum_j B(i,j)\,B(i,j)\right)/2} \]

Normalized Absolute Value Difference

\[ 1.0 - \frac{\sum_i \sum_j \left| A(i,j) - B(i,j) \right|}
              {\sum_i \sum_j A(i,j) + \sum_i \sum_j B(i,j)} \]

All of these measures have a value of 1 for a perfect match.
Of these, the first choice is the most conventional, the
second a good approximation to the first, and the third is the
quickest to evaluate.
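
The three metrics can be written compactly as below. This is a
sketch under the assumption that each feature is passed as a flat
list of non-negative pixel values with at least one non-zero
entry; the function names are hypothetical.

```python
import math

def normalized_correlation(a, b):
    """Cross product sum over the product of the two root energy sums."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def moravec_correlation(a, b):
    """Cross product sum over the mean of the two energy sums."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) + sum(y * y for y in b)) / 2.0
    return num / den

def normalized_abs_difference(a, b):
    """1 minus the summed absolute difference over the summed intensities."""
    diff = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 - diff / (sum(a) + sum(b))
```

All three return 1 when the two features are identical. Only the
normalized correlation stays at 1 when one feature is a scaled
copy of the other; the Moravec form and the absolute value
difference both penalize a change in overall intensity.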


The interpolation process approximates the potential
displacements of a feature from an initial image into a
succeeding image. Depending on the accuracy required,
positions along the image displacement path can be
approximated roughly by setting the coordinates of the
feature's position to the nearest integer value, or more
accurately by performing a subpixel interpolation of the
feature at each of a set of selected positions along the image
displacement path. The basic trade-off is between speed and
accuracy, with subpixel interpolation being a more expensive
computation.
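
The two options can be contrasted with a small sketch that samples
a single pixel value at a real-valued position; a feature array
would be built by sampling each of its pixels this way. The
function names and the image[row][column] layout are assumptions,
and bilinear interpolation is used as the subpixel scheme because
the precise form of evaluation in section 3.3 names it.

```python
import math

def sample_nearest(image, x, y):
    """Rough form: round the position to the nearest pixel."""
    return image[int(math.floor(y + 0.5))][int(math.floor(x + 0.5))]

def sample_bilinear(image, x, y):
    """Subpixel form: blend the four pixels surrounding (x, y)."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - x0, y - y0
    # Clamp the far neighbors so positions on the last row/column work.
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom
```

The nearest-pixel form jumps when a position crosses a pixel
boundary, while the bilinear form changes continuously with
position at the cost of four reads and several multiplications
per sample.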


3.3. Error Measure

The error measure associates with a point on the direction of
translation sphere a number describing the quality of feature
matches along the image displacement paths determined by the
corresponding translational axis. This error value is
computed by first finding the best match for each feature
along a segment of the image displacement path determined by
the translational axis using one of the normalized match
metrics above. Each of these values is then subtracted from
one, and all the resulting values are added together to form
an error measure. Thus, for a set of N features in an initial
image, a hypothesized translational axis, and use of one of
the match metrics above, the error measure is

\[ \sum_{i=1}^{N} \left( 1.0 - \mathrm{bestmatch}(i) \right) \]

where bestmatch(i) is the best match value associated with
feature i along the image displacement path determined for it
by the hypothesized translational axis.
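
The following sketch evaluates this sum for an FOE given directly
in pixel coordinates, using the absolute value difference metric
and nearest-integer positions. All names are hypothetical, images
are lists of rows, feature positions are (x, y) pixel pairs, and
displacement is assumed to be away from the focus (for an FOC the
direction would be reversed).

```python
import math

def patch(image, x, y, half):
    """Flattened square feature centered at integer pixel (x, y)."""
    return [image[r][c]
            for r in range(y - half, y + half + 1)
            for c in range(x - half, x + half + 1)]

def abs_difference_match(a, b):
    """Normalized absolute value difference; 1.0 is a perfect match."""
    total = sum(a) + sum(b)
    if total == 0:
        return 1.0
    return 1.0 - sum(abs(p - q) for p, q in zip(a, b)) / total

def best_match(first, second, pos, focus, half=1, max_disp=10):
    """Best match for the feature at pos along the ray leaving focus."""
    height, width = len(second), len(second[0])
    dx, dy = pos[0] - focus[0], pos[1] - focus[1]
    norm = math.hypot(dx, dy)
    ux, uy = (dx / norm, dy / norm) if norm else (0.0, 0.0)
    feature = patch(first, pos[0], pos[1], half)
    best = 0.0
    for step in range(max_disp + 1):
        # Nearest-integer approximation of the position on the path.
        x = int(math.floor(pos[0] + step * ux + 0.5))
        y = int(math.floor(pos[1] + step * uy + 0.5))
        if half <= x < width - half and half <= y < height - half:
            candidate = patch(second, x, y, half)
            best = max(best, abs_difference_match(feature, candidate))
    return best

def error_measure(first, second, features, focus, half=1, max_disp=10):
    """Sum over features of 1.0 minus the best match along each path."""
    return sum(1.0 - best_match(first, second, p, focus, half, max_disp)
               for p in features)
```

For a single bright spot that moves two pixels to the right
between images, a focus placed to its left on the same row gives
an error of 0, while a focus placed directly above it gives an
error of 1, because the vertical path never reaches the spot.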

The error measure was computed in two forms in the experiments
below: a fast evaluation form and a precise evaluation form.
The fast form uses the absolute value norm and the nearest
integer approximation to determine feature position along the
image displacement paths. The fast form is useful for
evaluating image sequences with several extracted features to
determine the rough position of the global minimum. However,
the fast form is not adequate for fine determination of the
translational axis because it does not vary smoothly with
respect to small changes in the position of a translational
axis, due to the nearest integer approximation for feature
position.

The precise form of evaluation uses the Moravec norm and
bi-linear interpolation. It has been found to vary smoothly
with respect to small changes in the position of a
translational axis.


3.4. Search Organization

The search process used here consists of two phases. A global
sampling of the error measure determines the rough shape of
the error surface, then a local search determines the minimum.
The local search begins at the position where the minimum
value was determined by the global sampling. The procedure
used for the local search is steepest descent with a
diminishing step-size. That is, the steepest descent
procedure begins with an initial fixed step size and determines
a local minimum using it. The step-size is then reduced and
the procedure repeated until the step-size is at the desired
resolution for the determination of the translational axis.
In the experiments below the initial step-size was set to 0.1
and then reduced successively to 0.025 and 0.005 radians.
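
A minimal sketch of the two phases follows. The text does not say
how the descent direction is chosen, so this version assumes a
discrete descent that moves to the best of the 8 neighboring
positions at the current step size. It also assumes Phi1 ranges
over [0, 2 PI) and Phi2 over [0, PI]; error is any function of
(Phi1, Phi2), and the function names are hypothetical.

```python
import math

def global_sample(error, increment=math.pi / 10):
    """Evaluate error on a fixed-increment grid; return the best position."""
    steps = int(round(math.pi / increment))
    grid = [(i * increment, j * increment)
            for i in range(2 * steps) for j in range(steps + 1)]
    return min(grid, key=lambda p: error(*p))

def local_search(error, start, step_sizes=(0.1, 0.025, 0.005)):
    """Descend to a local minimum, then repeat with each smaller step."""
    best, best_err = start, error(*start)
    for step in step_sizes:
        improved = True
        while improved:
            improved = False
            neighbors = [(best[0] + a * step, best[1] + b * step)
                         for a in (-1, 0, 1) for b in (-1, 0, 1)
                         if (a, b) != (0, 0)]
            candidate = min(neighbors, key=lambda p: error(*p))
            candidate_err = error(*candidate)
            if candidate_err < best_err:
                best, best_err, improved = candidate, candidate_err, True
    return best, best_err

# Synthetic smooth error surface with its minimum at (1.5, 1.2).
def bowl(phi1, phi2):
    return (phi1 - 1.5) ** 2 + (phi2 - 1.2) ** 2

start = global_sample(bowl)
found, found_err = local_search(bowl, start)
```

On this synthetic surface the global sample lands on the grid
point nearest the minimum and the local search then refines it to
within the final step size of 0.005 radians.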

As will be seen in the following experiments, the error
measure is smooth, with a single minimum in a large
neighborhood around the correct translational axis. Thus, the
global sampling can be quite sparse or the initial step size
of the local search quite large.


4. EXPERIMENTS

The following experiments were performed using the roadsign
and industrial image sequences introduced earlier. They cover
a wide range of situations. The first experiment involves
determining the translational axis for the industrial image
sequence using the features indicated in figure 5e. In this
sequence the translational axis intersects the image plane in
a visible portion of the image. The second experiment
involves processing the industrial image sequence using a
small number of features that are positioned across the
initial image of the sequence. The third experiment involves
processing the roadsign image sequence using the features
extracted at the positions indicated in figure 4e from the
initial, unconvolved image. In this sequence the intersection
of the translational axis and the image plane is not in the
visible portion of the image. The fourth experiment involves
processing the roadsign image sequence, but using the features
extracted prior to low-curvature suppression. This has the
effect of introducing weak and spurious features into the
error measure computation. The fifth experiment involves
processing the roadsign images using features extracted from a
small area of the initial image.

In all of the experiments, the maximal displacement along an
image displacement path was set to 10 pixels. Displacements
were in increments of 1 pixel along the image displacement
paths. Features were 7x7 pixel arrays centered at the
positions in the indicated figures.
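
As a rough illustration of these settings, the sketch below
matches one 7x7 feature at displacements of 0 to 10 pixels along
the ray from an assumed FOE through the feature. The function
names, the mean-absolute-difference score, and the nearest-pixel
rounding (standing in for interpixel interpolation) are
assumptions, not the paper's exact error measure.

```python
import math

def patch_error(img1, img2, r1, c1, r2, c2, half=3):
    """Mean absolute difference between two (2*half+1)-square patches."""
    total = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            total += abs(img1[r1 + dr][c1 + dc] - img2[r2 + dr][c2 + dc])
    return total / (2 * half + 1) ** 2

def best_match_on_path(img1, img2, feature, foe, max_disp=10):
    """Best displacement (in whole pixels) along the ray FOE -> feature.

    feature and foe are (row, col) pairs; the caller must keep every
    sampled patch inside both images. Returns (displacement, error).
    """
    fr, fc = feature
    dr, dc = fr - foe[0], fc - foe[1]
    norm = math.hypot(dr, dc)
    dr, dc = dr / norm, dc / norm
    best = None
    for d in range(max_disp + 1):
        # Nearest-pixel sampling of the point d pixels along the path.
        r2, c2 = round(fr + d * dr), round(fc + d * dc)
        err = patch_error(img1, img2, fr, fc, r2, c2)
        if best is None or err < best[1]:
            best = (d, err)
    return best
```

For an FOC the search direction would point toward the
intersection point rather than away from it.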

For each experiment, the results of processing are contained
in 3 tables. The first two (tables a and b) indicate the
values of the error measure during the global sampling of
points using a fixed angular increment (equal to PI/10 or
0.31416 radians or 18 degrees) on the direction of translation
sphere. The first of these tables corresponds to
translational axes which intersect the image plane at FOEs.
The second basically corresponds to those which intersect the
image plane at FOCs. Recall that the Phi1 coordinate
determines a plane containing the X-axis of the camera
coordinate system and Phi2 refers to positions of unit vectors
in such a plane.
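
A minimal sketch of such a global sampling follows. The angular
ranges (Phi1 over a full turn, Phi2 over a half turn) and the
names global_sample and error are assumptions chosen so that the
grid covers the sphere; the partition into tables a and b is not
reproduced.

```python
import math

def global_sample(error, increment=math.pi / 10):
    """Evaluate `error` on a regular (phi1, phi2) grid; return the minimum.

    `error` is a hypothetical callable for the feature-mismatch measure.
    The ranges used here are assumed, not taken from the paper's tables.
    """
    n1 = round(2 * math.pi / increment)  # phi1 samples over [0, 2*pi)
    n2 = round(math.pi / increment)      # phi2 samples over [0, pi]
    best = None
    for i in range(n1):
        for j in range(n2 + 1):
            phi1, phi2 = i * increment, j * increment
            e = error(phi1, phi2)
            if best is None or e < best[2]:
                best = (phi1, phi2, e)
    return best
```

The returned position would then serve as the starting point of
the local search.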

The third table (table c) shows the minimal value determined
by the global sampling process and the successive values of
the error measure determined during the local search. In this
table, the position of the translational axis is referred to
in terms of (X,Y,Z) camera coordinates, in addition to
(Phi1,Phi2) coordinates, so that translational axes computed
under different situations can be compared.
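
The relation between the two coordinate forms can be sketched
as follows. The sign and angle convention used here is an
assumption, inferred because it reproduces the axis reported
for experiment 5 from its final local-search position, read
here as (Phi1,Phi2) = (2.4941, 0.56832); the function name is
hypothetical.

```python
import math

def axis_from_angles(phi1, phi2):
    """Unit vector (X, Y, Z) for (phi1, phi2) under an inferred convention.

    phi1 rotates a plane about the camera X-axis; phi2 is the position
    within that plane, measured from the negative X direction (assumed).
    """
    x = -math.cos(phi2)
    y = math.sin(phi2) * math.cos(phi1)
    z = math.sin(phi2) * math.sin(phi1)
    return x, y, z

# Final position of experiment 5, as read from Table 5c.
print(axis_from_angles(2.4941, 0.56832))
```

With this convention the reported (X,Y,Z) values are unit
vectors, which is what allows axes from different experiments
to be compared by the angle between them.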


4.1. Industrial Images


The procedure was applied to the industrial images using the
features extracted at the positions shown in figure 5e.
Tables 1a and 1b show the global sampling of the error measure
using the fast form of evaluation. Note the minimum at
Phi1=1.57080 and Phi2=1.25660. Table 1c shows the successive
values of the local search using the precise form of
evaluation. The determined translational axis is (-0.13875,
-0.09887, 0.98538). The image displacements determined for
these features are shown in figure 7.


4.2. Industrial Images with Selected Features

The  procedure  was  again  applied  to  the  industrial  image 
sequence  but  using  features  which  were  selected  by  hand  from 
those  indicated  in  figure  5e.  The  positions  of  these  8 
features  are  shown  in  figure  8. 

Tables 2a and 2b show the global sampling of the error measure
using the precise form of evaluation. Note the minimum at
Phi1=1.57080 and Phi2=1.57080. Table 2c shows the successive
positions determined by the local search. The determined
translational axis was (-0.15438, -0.07896, 0.98485). This
corresponds to an angular difference of 0.0253 radians (1.4505
degrees) with respect to the axis determined in experiment 1.
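
The angular difference quoted above can be checked directly
from the two reported axes. The helper name angle_between is
hypothetical; the atan2 form is used only because it is
numerically stable for small angles.

```python
import math

def angle_between(a, b):
    """Angle in radians between two 3-vectors."""
    cx = a[1] * b[2] - a[2] * b[1]
    cy = a[2] * b[0] - a[0] * b[2]
    cz = a[0] * b[1] - a[1] * b[0]
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    # |a x b| / (a . b) = tan(angle), independent of vector lengths.
    return math.atan2(math.sqrt(cx * cx + cy * cy + cz * cz), dot)

axis_1 = (-0.13875, -0.09887, 0.98538)  # experiment 1
axis_2 = (-0.15438, -0.07896, 0.98485)  # experiment 2
theta = angle_between(axis_1, axis_2)
print(theta, math.degrees(theta))
```

The same helper applies to the comparisons reported for
experiments 3, 4 and 5.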

Figure 7. Image Displacements for Industrial Image Sequence

4.3. Roadsign Image Sequence

The procedure was applied to the roadsign image sequence using
the features extracted at the positions indicated in figure
4e. Tables 3a and 3b show the global sampling of the error
measure using the fast form of evaluation. Note the minimum at
Phi1=2.51330 and Phi2=0.62832. Table 3c shows the successive
values of the local search using the precise form of
evaluation for the error measure. The translational axis
determined by this process is (-0.83738, -0.42043, 0.34933).

The image displacements for the feature points shown in figure
4e consistent with this translational axis are shown in figure
9.

Given the direction of translation and image displacements,
the relative environmental depths of image points can be
recovered by the simple relation in equation 1. When image
displacements are small, the inferred depth values can be
quite erratic due to sensitivity to small numbers in the
denominator in the left hand side of equation 1. For this
reason, it is necessary to keep track of the image
displacements over several successive images with concurrent
updating of the inferred depth values. This was done using a
sequence of four successive images from the roadsign sequence
beginning with roadsign images 1 and 2 and using the features
from image 1 at the positions in figure 4e. The position of
the translational axis determined from images I(t) and I(t+1)
was used as the initial value in the local search for
determining the translational axis for images I(t+1) and I(t+2).

The displacements of all features along the contour in figure
4c were determined along the image displacement paths
determined by the translational axis found for images I(1) and
I(4). From these displacements the depth values for image
points along the contour were computed using equation 1.
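
Equation 1 is not reproduced in this section. As a hedged
illustration only, the sketch below uses the standard relation
for pure translation, in which the distance of an image point
from the FOE divided by its displacement gives the time
remaining until contact. The function name and arguments are
assumptions, and the expression may differ in form from
equation 1.

```python
def time_until_contact(d_first, d_last, frames_elapsed=1):
    """Frames remaining, counted from the last image, before contact.

    d_first, d_last: distances of the feature from the FOE in the first
    and last image of the span. Assumes pure translation with the point
    moving away from the FOE, so the distance grows over the span.
    """
    displacement = d_last - d_first
    if displacement <= 0:
        raise ValueError("no measurable expansion; depth is unconstrained")
    # A small displacement in the denominator makes the estimate erratic,
    # which is why displacements are accumulated over several images.
    return frames_elapsed * d_first / displacement
```

Accumulating the displacement over a longer span, as done with
images I(1) to I(4), enlarges the denominator and steadies the
estimate.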

The roadsign sequence is particularly well suited for presenting
depth processing results because the three environmental
objects in the images are at three distinct depths. This is
shown in figure 10a by the three distinct clusters in the
histogram of the depth values calculated for the points along
the contour. The units in the histogram are cumulative
time-until-contact values. That is, the depth is given in
pole), the boundary segment shown in figure 10d (the trees).
Points near the image boundary of I(1) were ignored because
the processing did not take into account occlusion effects
along the image boundaries.

4.4. Roadsign Sequence with Redundant Features

The procedure was applied to the roadsign image sequence using
the features which were extracted prior to low-curvature
suppression. The positions of these features are shown in
figure 4d. This has the effect of including several weak and
false features in the evaluation of the error measure.

Tables 4a and 4b show the values of the global sampling of the
error measure using the fast form of evaluation. Note the
minimum at Phi1=2.51330 and Phi2=0.62832. Table 4c shows the
successive values of the local search. The determined
translational axis was (-0.82909, -0.42281, 0.36585). This
corresponds to an angle of 0.0186 radians (1.0676 degrees)
with respect to the axis determined in experiment 3.

4.5. Roadsign Subimage


The procedure was again applied to the roadsign image sequence
with features restricted to the rectangular area shown in
figure 11, corresponding to texture in the distant trees.

Tables 5a and 5b show the values of the global sampling of the
error measure using the precise form of evaluation. Note the
minimum at Phi1=2.19910 and Phi2=0.62832. Table 5c shows the
successive values determined by the local search. The
translational axis is determined to be (-0.84281, -0.42928,
0.32465). This corresponds to angles of 0.0267 radians
(1.5341 degrees) and 0.0439 radians (2.5155 degrees), with
respect to the translational axes determined in experiments 3
and 4 respectively.
Table 5c

Phi1      Phi2      X          Y          Z         Error      Stepsize
2.1991    0.62832   -0.80902   -0.34549   0.47553   0.059910   -
2.2991    0.62832   -0.80902   -0.39123   0.43867   0.059592   0.1
2.4741    0.57832   -0.83738   -0.42920   0.33837   0.059288   0.025
2.4941    0.56832   -0.84281   -0.42928   0.32465   0.059269   0.005

5. DISCUSSION


This paper presents a simple and robust procedure for
determining the direction of environmental motion and image
displacements in image sequences produced by a translating
sensor. The procedure is robust in several different ways.
It is resilient with respect to weak and false features. It is
not dependent on identical features being extracted in
successive images prior to matching. It can use a small
number of features positioned across an image surface or a
small number of features from a limited area of the image.

The primary difficulty with real-time implementation is the
expense of performing the interpixel interpolation for fine
resolution and of performing correlations on features which
are arrays of pixels. The computational limitation is the
speed with which a feature can determine and evaluate its
potential matches given a specified translational axis. These
computations can be carried out in parallel among the
individual features. The expense can be lessened somewhat
through the use of higher resolution images (to lessen the
need for interpixel interpolation), use of the absolute value
norm, and by sampling images at a rate sufficient to limit the
extent of feature displacements between images. Additionally,
if the direction of sensor translation is changing slowly,
information from processing preceding images can be used to
speed up the procedure when applied to succeeding images. It
is also possible to modify the procedure so that features are
extracted from both images and correlations are performed only
at the positions where features have been extracted in the
succeeding image. This would probably affect the robustness
and resolution of the procedure. We hope that real-time
implementation will eventually be possible with special
purpose, highly parallel architectures.

In the remainder of this section, we review some specific
aspects of the procedure and outline some potential extensions
and applications.

5.1. Feature Extraction

Since the procedure's performance does not degrade severely
due to the occurrence of poor features, the type of feature
extraction used is probably not critical. Nonetheless, the
feature extraction process used here could be extended in many
ways. The low-curvature suppression could take into account
boundary length along a contour between distinctiveness maxima
to determine whether to suppress or generate a feature for
further processing. It may also be possible to determine
points of high curvature along the boundary without having to
walk along the contour by using operators which can directly
measure curvature [KITC80].

Another useful extension would be to use information
determined from the extraction of the translational axis to
isolate false features. This could involve removing those
features which have weak matches from the error measure
calculation once a translational axis has been determined and
re-evaluating. Alternatively, the depth inferences could be
used to isolate the positions of potential false features by
noting discontinuities in depth along an extracted contour.
Extracted features could be removed from the re-evaluation of
the error measure if they are at or near such positions.
Another type of feature which can affect the evaluation of the
error measure is one near an FOE or FOC which is contained
in a visible portion of the image. Such features tend to move
very small amounts along their image displacement paths and
hence require fine interpolation to determine their best
matches.

5.2. Search Process

5.2.1. Properties of the Error Measure

In the experiments, the error measure has a distinct global
minimum at the point on the unit sphere corresponding to the
correct translational axis. It is expected to have such
behavior generally because it is very unlikely that
translational axes that are far from the correct position will
define image displacement paths that simultaneously allow good
matches for many features. Thus competing candidates for the
global minimum should not be widely separated.

The error measure is affected by both non-distinctive and
false features. Non-distinctive features will match well for
many different translational axes. Large numbers of these
weak features will flatten the response of the error measure.
False features will also distort the error measure since they
will often have their best matches with incorrect
translational axes.

The effects of these poor features should be compensated by
the agreement of good features. Every one of the good
features will tend to have a bad match for an incorrect
translational axis, and their unanimity is expected to override
the lack of discrimination of weak features and the random
quality of the matches of false features.


5.2.2. Utility of the Direction of Translation Sphere


There are significant advantages in defining the error measure
with respect to a unit sphere, instead of the potential
positions of FOEs and FOCs in the image plane. The sphere is
a bounded surface which makes uniform global sampling of the
error measure feasible. Additionally, the resolution in the
position of the translational axis varies across the surface
of the image plane. For example, the FOEs determined by
translational axes separated by very small angles will be
separated by larger and larger distances in the plane as the
intersections of the translational axes and the image plane
are placed further from the visible image. The effect of this
on the error measure, when it is defined over the image plane,
is large flat areas for FOEs further from the visible portions
of the image. Finally, special criteria must be used to
distinguish between FOEs and FOCs if the error measure is
defined relative to the image plane. Roughly parallel image
displacements could correspond to an FOE off to one side of
the image plane or to an FOC off to the opposite side. On the
direction of translation sphere, the corresponding
translational axes would be close while on the plane they are
completely separated.
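
The variation in resolution across the image plane can be made
concrete with a small worked example. Assuming a unit focal
length and two axes lying in a common plane through the optical
axis, an axis at angle theta from the optical axis meets the
image plane at distance tan(theta) from the image center. The
function name and the sampled angles are illustrative
assumptions.

```python
import math

def foe_plane_separation(theta, delta, focal_length=1.0):
    """Image-plane distance between the FOEs of two axes at angles
    theta and theta + delta (radians) from the optical axis."""
    return focal_length * (math.tan(theta + delta) - math.tan(theta))

# The same 0.01 radian separation on the sphere, at increasing obliquity.
for theta in (0.0, 0.5, 1.0, 1.4, 1.5):
    sep = foe_plane_separation(theta, 0.01)
    print(f"theta={theta:.1f} rad  plane separation={sep:.4f}")
```

The separation grows without bound as theta approaches PI/2,
whereas the separation on the sphere stays at 0.01 radians
throughout.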


5.2.3. Optimization Procedure

The optimization procedure used here is very simple, and,
because of the strong unimodality of the error measure and its
smoothness, other techniques with more rapid convergence could
be used. It is interesting to note, however, that the global
component of the optimization performed here is an instance of
a generalized Hough Transform [BAL81, O'ROU81] in which each
feature scales its vote for a particular translational axis by
the best match it can find consistent with the translational
axis.
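
The voting view can be sketched as follows. Expressing a vote as
one minus a mismatch scaled to [0, 1] is an assumption made for
illustration, as are the names vote_for_axes and
best_match_error; the procedure itself minimizes a summed error
rather than maximizing votes.

```python
def vote_for_axes(candidate_axes, features, best_match_error):
    """Accumulate one weighted vote per (axis, feature) pair.

    best_match_error(feature, axis) is a hypothetical callable giving
    the smallest mismatch, scaled to [0, 1], found along the image
    displacement path that `axis` defines for `feature`.
    """
    accumulator = {}
    for axis in candidate_axes:
        accumulator[axis] = sum(
            1.0 - best_match_error(f, axis) for f in features
        )
    winner = max(accumulator, key=accumulator.get)
    return winner, accumulator
```

In this form a weak feature adds nearly the same vote to every
axis and so only flattens the accumulator, while good features
that agree on one axis raise a single peak.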


5.3. Extensions and Applications

5.3.1. Other Cases of Restricted Motion

The procedure developed in this paper is applicable to other
cases of unknown but restricted camera motions for which it is
computationally feasible to search directly through a subspace
of the camera motion parameters to determine feature matches.
Two particular cases are pure sensor rotation and motion
constrained to a known plane.

For pure sensor rotation, there are three unknown camera
parameters: two for the axis of rotation and one for the
extent of rotation about the axis. In this case, the error
measure would be defined with respect to a unit sphere in which
each point corresponds to an axis of rotation. For each
rotational axis, the extent of displacement for image features
is determined by different rotations about the axis. There is
the additional constraint in the rotational case that the
displacements of all features must correspond to the same
extent of rotation.
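
A sketch of the image displacement path for the rotational case
is given below, assuming a pinhole camera with image point
(x, y) viewed along the ray (x, y, focal_length). Rodrigues'
formula is a standard choice introduced here for illustration;
the function names and the sign convention for the rotation
angle are assumptions.

```python
import math

def rotate_image_point(x, y, axis, angle, focal_length=1.0):
    """Image position of (x, y) after rotating its viewing ray about
    the unit vector `axis` by `angle` radians (Rodrigues' formula)."""
    kx, ky, kz = axis
    vx, vy, vz = x, y, focal_length
    c, s = math.cos(angle), math.sin(angle)
    dot = kx * vx + ky * vy + kz * vz
    cross = (ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx)
    rx = vx * c + cross[0] * s + kx * dot * (1 - c)
    ry = vy * c + cross[1] * s + ky * dot * (1 - c)
    rz = vz * c + cross[2] * s + kz * dot * (1 - c)
    # Reproject the rotated ray onto the image plane.
    return focal_length * rx / rz, focal_length * ry / rz

def rotation_path(x, y, axis, angles):
    """Candidate positions of one feature for a list of rotation extents."""
    return [rotate_image_point(x, y, axis, a) for a in angles]
```

Under the constraint stated above, a candidate is scored by
evaluating every feature at the same entry of angles, rather
than letting each feature pick its own best extent.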

During arbitrary sensor motion relative to a stationary
environment, the image motion due to distant environmental
points is primarily due to the rotational component of sensor
motion. Sensor rotation can be recovered by applying the
observer rotation processing procedure to the images of such
distant points. The rotation can then be subtracted out to
yield successive images related by sensor translation only.
These resulting images can then be processed by the technique
here.


5.3.2. Multiple Independently Moving Objects

The processing here has been limited to a camera moving
relative to a stationary environment, or a stationary camera
with a stationary background and a single moving object. A
useful extension would allow for several independently moving
objects with different directions of translation. The
technique of summation of errors in feature matching only
allows a single axis of translation to be determined and will
cause the analysis of the several objects in independent
motion to be confounded. Due to the similarity of the global
search and a generalized Hough transform noted above, the
suggested techniques for decomposing generalized Hough
transforms into constituent objects having different
parameter values [ADI82, BAL81, O'ROU81] may be applicable.

Another approach is to segment an image into regions which
potentially correspond to objects, or to arbitrarily divide
the image into regular overlapping subimages and perform the
translational analysis for each region or subimage
independently [WIL80, NAG79]. Experiments have shown it is
possible to work with small image areas, at a size comparable
to extracted regions or subimage areas, and still determine
the axis of translation with a reasonable level of precision.
If objects with similar translations correspond to several
different regions or image subareas, then similar
translational axes will be determined for these regions or
subimages. If objects with different translations correspond
to the same regions or subimages then there will be poor,
indistinct error values for the error function. For this
second case, it is necessary to resegment and redetermine a
translational axis.
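
The arbitrary division into regular overlapping subimages might
be sketched as follows. The window size and stride are
illustrative parameters rather than values from the
experiments, and the function name is hypothetical.

```python
def overlapping_windows(width, height, size, stride):
    """Square subimages as (left, top, width, height) tuples.

    A stride smaller than size makes neighbouring windows overlap.
    """
    windows = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            windows.append((left, top, size, size))
    return windows

# Example: 64-pixel windows with 50 percent overlap on a 128x128 image.
print(overlapping_windows(128, 128, 64, 32))
```

The translational analysis would then be run once per window,
using only the features whose positions fall inside it.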


5.3.3. Stabilized Retina

Translational processing is sufficient for vision-based
navigation in a stationary environment if the orientation of
the optic sensor can be fixed relative to the environment over
time. In this case, sensor motion amounts to a sequence of
translations in possibly different directions over time.

A difficulty with such a stabilized retina is that much of the
environment would not be observable. This can be corrected by
using a set of such stabilized retinas arranged to yield a
complete view of space. There would then be no need to rotate
the sensor to view a particular environmental point. A
possible arrangement of retinal surfaces is a cubical one.
One of the retinal planes will always contain an FOE and
another will always contain an FOC (unless the direction of
motion is right on an edge of the cube and the focal length
has not been properly adjusted). There will also be several
independent estimates of the direction of translation which can
be integrated.
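
For the cubical arrangement, the face pierced by a given
translational axis is the one whose normal has the largest
absolute component. The sketch below names faces by the camera
axis they face; which of the two faces holds the FOE and which
the FOC depends on a sign convention not fixed here. Combining
estimates by averaging unit vectors is an assumed, simple choice
of integration, and both function names are hypothetical.

```python
import math

def cube_faces_for_axis(axis):
    """Return (face pierced by `axis`, opposite face), e.g. ('-X', '+X')."""
    names = "XYZ"
    i = max(range(3), key=lambda k: abs(axis[k]))
    pierced = ("+" if axis[i] >= 0 else "-") + names[i]
    opposite = ("-" if axis[i] >= 0 else "+") + names[i]
    return pierced, opposite

def combine_estimates(axes):
    """Average several unit-vector estimates and renormalise."""
    sx = sum(a[0] for a in axes)
    sy = sum(a[1] for a in axes)
    sz = sum(a[2] for a in axes)
    norm = math.sqrt(sx * sx + sy * sy + sz * sz)
    return sx / norm, sy / norm, sz / norm
```

A weighted average, with weights taken from the distinctness of
each face's error minimum, would be a natural refinement.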


5.3.4. The Local Translational Decomposition

This technique can be extended to less restricted forms of
sensor motion by applying the procedure for translational
motion to small, overlapping areas across an image surface
over a sequence of images. This approximates more general
motions as consisting locally of environmental translations
and interprets local image motion as resulting from
environmental translations. The feasibility of this is based
upon experiments showing that the direction of translation can
be extracted with reasonable precision using small image areas
containing a few features. The resulting description
associates with a set of image points (or small image areas)
the approximated direction of motion of the corresponding
environmental points (or small environmental surface areas).
As a low level representation of environmental motion, this
can considerably simplify the recovery of the sensor motion
parameters [LAW82]. It can also provide qualitative
information concerning the rough direction of motion of
objects in a scene.